Life aboard an 18th-century British sailing ship left much to be desired. Salt caked your clothes. Rats shared your food. Hard, sun-baked days retreated into cold, damp nights. You could fall off a foremast, blow over a bow, drown in the deep, succumb to scurvy, or be vanquished by venereal disease. All the while, you counted. You counted the dawns at sea and the stars at dusk. You counted knots, fathoms, and degrees. Ocean crossings became makeshift research studies, where sailors quantified the lengthy distances between ports and the even longer durations between paychecks.
Naval voyages could take months.1 To discourage sailors from deserting, ships’ captains often suspended sailors’ wages. The promise of future earnings encouraged love-struck seamen to re-board their ships after long nights of frolicking in foreign harbor towns.
The 1790 diary of George Hodge2 chronicled his career in the Royal Navy, during which time he waited 17 years to be paid his full wages. In the interim, he and his fellow sailors were given a “tot,” a daily ration of rum. Both buyers and sellers valued rum. Its value lay in its alcohol. Unscrupulous sellers would dilute a cask of rum with water, reducing both its alcohol content and its value. In response, buyers devised ingenious ways to test rum’s quality prior to purchasing it.
The phrase “keep your powder dry” originates from a warning issued to sailors and soldiers.3 Once gunpowder becomes wet, it will not ignite. Muskets will not shoot. Cannons will not fire. This is true in all cases save one: gunpowder will ignite in a mixture of water and alcohol, but only when the percentage of alcohol is high enough to counter water’s extinguishing effects.
Sailors would sprinkle gunpowder over a small pool of rum and then attempt to set it ablaze. Watered-down rum soaked the gunpowder and would not ignite. It fizzled, instead. But at 57% alcohol,4 magic happened: the rum burned. A brilliant flash indicated a high percentage of alcohol. Rum that burned gave sailors proof of its quality—the rum was “proofed.” This age-old measurement is why we see liquor bottles labeled with their proofs, even today.
Our present-day means of measuring things are more accurate, but the goal is the same: we measure to prove. Like untrusting sailors testing a cask of rum, we wait to see the flash or fizzle before we declare our success or failure.
We now swim in a sea of data. Quantitative research provides us with a means to navigate it. It measures the world through numerical and statistical analyses. It reports budgets, records populations, and measures speeds. What is the average cost of a U.S. aircraft carrier? Where are women-owned firms flourishing? How long does it take for users to check out? On the surface, such data denotes little information other than numbers. But further analysis uncovers additional insights. Soaring budgets may signal a rising commodity market. Successful economic zones may indicate an advantageous tax policy. Long checkout times may reveal problems with a website’s shopping cart.
Each measurement quantifies data and shapes our research, proving our success or failure. Where we once guessed people’s behavior, we now can track their every click, tap, and swipe. Yet, research looks backward; we see the wake of the ship, but never what lies ahead. We cannot predict the future with certainty, but we can measure which direction the wind is blowing.
Significance
When we delve deeper into quantitative research, we discover that what we are really talking about is significance. Which data aids our decision making, and which is merely paper and pixels? A researcher can endlessly record and analyze the world—but to what end? For quantitative research to be useful, it must be practically and statistically significant.
Before you jump overboard, know that we will only skim the surface of statistics here. We will cover the basics while avoiding the details that make math professors rejoice and grad students cry.
Let us start by defining a few terms.
A population is the entirety of a data set, be it a population of English sailors, flying fish, or rum barrels. A population includes every sailor, fish, or barrel—not just the big ones; not just the small ones; not just the ones we want to include. Every single one.
A sample is a subset of a population, such as a few cups of rum drawn from a handful of barrels. A sample’s average alcohol proof is a statistic (e.g., 74.6 proof).
Good statistics are generalizable, meaning the statistic can be used to infer conclusions about an entire population. We say the average alcohol proof of a few cups of rum represents the average alcohol proof of all rum barrels. Generalized statistics are not infallible; they do not always lead to exact matches when extending our research across an entire population. Our sample may indicate an average alcohol proof of 74.6, but a few rum barrels might be watered down, while others might put hair on your chest.
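To make these terms concrete, here is a minimal sketch in Python. The cup measurements are hypothetical, chosen only so the arithmetic matches the 74.6 proof mentioned above; the point is that a statistic computed from a sample is used to estimate a property of the whole population.

```python
# A hypothetical sample: the measured proof of five cups of rum,
# each drawn from a different barrel in the hold.
sample_proofs = [76.0, 73.2, 74.9, 75.1, 73.8]

# The statistic: the sample's average proof.
sample_mean = sum(sample_proofs) / len(sample_proofs)   # 74.6

# If the sample is random and representative, we generalize the statistic
# to the population and infer that all the barrels average roughly 74.6 proof.
print(f"Estimated average proof across all barrels: {sample_mean:.1f}")
```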
Reliability describes how often a test produces similar measurements under similar conditions. Testing the height of barrels is reliable, as barrels tend to stay the same height over time. By contrast, rum’s color is not reliable, because its color fluctuates depending on any number of factors, such as the rum’s age and its means of storage.
Validity signifies the accuracy of research. From overall conclusions to individual measurements, we want all research efforts to be valid. Truth be told, lighting rum on fire is not an accurate means to measure alcohol content. A modern-day hydrometer would provide a much more valid measurement. But what fun is that?
Luckily, we have many software tools to analyze populations, samples, and statistics. They do much of the work for us. However, knowing how to analyze data allows us to interpret the resulting information and recognize if it is reliable and valid.
We will discuss issues that affect reliability and validity in the next chapter. For now, let us touch on the primary cause of quantitative research problems: sampling bias. Here we make the error of selecting a non-random sample, thereby affecting our ability to generalize subsequent research findings. Perhaps we select only the rum barrels stored below deck and do not account for the barrels slowly evaporating under the hot sun. Our sample would not be representative of the population. The same may occur when conducting surveys and other quantitative research. We inadvertently select only those people who wish to respond, skipping those who are too busy, uninterested, or unable to answer our inquiries. We miss busy moms, apathetic teens, older adults, non-English speakers, low-income audiences, and people with disabilities.
Assuming we can avoid the perils of sampling bias, we have several ways to collect quantitative data. We collect it through polls, questionnaires, surveys, A/B tests, web analytics, and search logs. Though the methods differ, each attempts to describe data. For example:
1,000,000 users visit the Fishes’R’Us website every year (population)
100,000 users visited the Fishes’R’Us website last month (sample)
3,000 users from the sample completed their purchase (statistic)
If we were to measure this sample of 100,000 taken from a population of 1,000,000, we could say that the website converted 3% of its visitors last month. The conversion rate is a statistical mean, averaging all visits from within the sample. Some visits led to a purchase. Many others did not.
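As a rough sketch of the arithmetic in Python (the 95% margin of error uses a normal approximation, which is my assumption rather than anything the example specifies), we can compute the sample's conversion rate and see how far it might reasonably stray when generalized to the population:

```python
import math

visits = 100_000     # the sample: last month's visitors
purchases = 3_000    # the statistic: completed purchases

# Conversion rate: the share of sampled visits that ended in a purchase.
conversion_rate = purchases / visits                       # 0.03, i.e., 3%

# 95% margin of error under a normal approximation to the binomial,
# assuming the sample is random and representative of the population.
standard_error = math.sqrt(conversion_rate * (1 - conversion_rate) / visits)
margin_of_error = 1.96 * standard_error

print(f"Conversion rate: {conversion_rate:.1%} ± {margin_of_error:.2%}")
```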
A 3% conversion rate is quite good, but what if we wanted to improve it? We could experiment with a free shipping incentive. Our hypothesis: By adding free shipping, we will increase our conversion rate. So, we could run an A/B test, which is a simple comparison of two variants:
Version A: 50,000 users are NOT offered free shipping (sample)
1,500 users complete their purchase (statistic)
Version B: 50,000 users are offered free shipping (sample)
2,500 users complete their purchase (statistic)
In Version A, our sample returned results that align with those in our previous test—still 3%. No surprise. With Version B, the website had a whopping 5% conversion rate, indicating a correlation between free shipping and conversions. If our test were reliable and valid, we could infer the incentive would be equally compelling across the entire population.
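Before generalizing, we would also want the lift to be statistically significant rather than a coincidence of sampling. A minimal sketch of a two-proportion z-test in Python (my own illustration; the chapter does not prescribe a particular test) looks like this:

```python
import math

# Version A: no free shipping. Version B: free shipping offered.
conversions_a, visitors_a = 1_500, 50_000   # 3% conversion rate
conversions_b, visitors_b = 2_500, 50_000   # 5% conversion rate

rate_a = conversions_a / visitors_a
rate_b = conversions_b / visitors_b

# Pool both samples to estimate one shared rate under the null hypothesis
# that free shipping makes no difference at all.
pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
standard_error = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))

# How many standard errors apart are the two rates?
z = (rate_b - rate_a) / standard_error

print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.1f}")
```

With these numbers the z-statistic lands around 16, far beyond the conventional 1.96 cutoff, so the two-point lift is very unlikely to be sampling noise.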
Yet, quantitative research only discovers what happened; it does not explain why something happened. It describes and implies. Its findings are neither complete nor certain. We are not fishing from a barrel.
Correlation and Causality
In the late, cool evening of April 14, 1912, lookouts on the deck of the steamship RMS Titanic spotted an iceberg off their starboard bow. Alarms were rung, orders were issued, and engines were reversed. We know the rest of the story. However, you may be surprised to learn about a remarkable coincidence, and what it shares with software design.
Among the Titanic’s survivors was a stewardess named Violet Jessop.8 In her 24 years, she had already endured much before the ship’s sinking, including what must have seemed like a warm-up act to her Titanic voyage. Through an unfortunate stroke of luck, she had just seven months earlier been a crewmember on the RMS Olympic, which nearly sank off the Isle of Wight as the result of a collision with the warship HMS Hawke.9 Violet Jessop must have felt a palpable sense of déjà-vu to once again find herself on board an ill-fated vessel when the hospital ship Britannic struck a mine and sank into the Aegean Sea. To survive one maritime disaster is harrowing. Two is unusual. But three is remarkable.
You might think that Violet would reconsider her choice of profession after being involved in three maritime disasters. However, she continued to work for cruise and shipping companies throughout her career. Despite an amazing level of coincidence, Violet Jessop had no bearing on the three events. She merely had a hapless employment history. Violet did not cause a single shipwreck, let alone all three.
Our adventures in quantitative research are not nearly as hazardous, but we do witness coincidences on a regular basis. Sales briefly increase. Page views temporarily decline. “Likes” momentarily stagnate. These behaviors are noticeable, but are they notable? Coincidences experienced during a project can often mislead us into making reactionary and shortsighted decisions. We will discuss three common hazards of research that lurk beneath the surface of your projects and sink good ideas.
Texas Sharpshooter Fallacy
Imagine for a moment a gritty cowboy, the type of fella that might have a mouthful of chewing tobacco and hips flanked by two Colt .45 revolvers. A western sun offsets his dusty silhouette, as tumbleweeds blow by in the distance. Our cowboy stands motionless, guns at the ready, staring with focused attention at an old, wooden barn standing several yards away. He spits, raises his revolvers, and quickly fires 12 shots.
As the dust clears, we see bullet holes scattered across the barn’s wooden wall in no apparent order or pattern. A few shots hit near the center of the wall. Some hit near the roof. Others hit near the foundation. The cowboy walks up to the barn, pulls out a piece of chalk from his pocket, and draws a single, continuous line around all the bullet holes. His drawing forms a large, weirdly shaped outline. Upon its completion, the cowboy exclaims, “Well, look’y here. All my shots hit the target!”
We can all be Texas sharpshooters if we do not carefully evaluate the entirety of the available data. Simply looking for clusters that align with our biases may lead us to incorrect conclusions.
For example, the review of a website’s analytic information serves as an excellent resource to evaluate past performance. However, we can use analytics to predict future performance only if the website stays the same, devoid of any design or technical changes. To do otherwise would be like trying to count old bullet holes in a new barn. Once you introduce changes to an experience, analytic information becomes purely historical. Until you accumulate a sufficient mass of new information, analytics are irrelevant. New barns only show new bullet holes. Even then, you still might draw the wrong target.
Acme Company changes its website’s home page and wants to evaluate the new design’s aesthetic merits. The company measures the number of visits. After making the change to the home page, fewer visitors view the page. Therefore, Acme Company believes the new design is less successful than the previous one.
In this example, Acme Company counts bullets (the number of visits) on the target (the site’s home page). Outside of search engine optimization, the number of visits rarely has anything to do with a page’s visual design. After all, a visitor could view the page and say, “I think this home page looks horrible,” and then leave. However, analytics software still counts his or her visit. A page visit is an ineffective means of evaluating visual design. The number of visits reflects market awareness and supporting media efforts, but not the page’s visual design. Acme counted bullets but chose the wrong target.
Paint your target, then count the bullet holes. You will be a sharpshooter in no time.
Procrustean Bed
If the Texas sharpshooter fallacy exuded a certain country charm, the story of the Procrustean bed should scare the hell out of you. According to Greek mythology,10 an old ironsmith named Procrustes would offer shelter to weary travelers along the road to Athens. While they slept, Procrustes would strap the travelers to their beds and stretch their bodies to fit the bed frame. Short people got off easy. The tall ones truly suffered. Procrustes chopped off their feet, ankles, and shins until the travelers fit neatly into their beds.
You find Procrustean solutions frequently in quantitative research. Data is stretched and truncated to meet a chosen outcome. Business objectives are overplayed; user needs are downplayed. Device requirements are overplayed; affordability is downplayed. Gesture controls are overplayed; the aging population is downplayed. Stretch. Chop. Enhance. Remove. We become data sadists.
We also affect data while collecting it. Selection bias stretches and pulls data by altering whom or what we select as the data’s source. Research trends, such as “get out of the building” (GOOB), can be a powerful tool to solicit feedback from users. Here, we leave our offices and visit a public setting. We find users and show them an app or website, engaging and testing how the audience responds. However, like Procrustes sizing up his guests on the road to Athens, we may inadvertently—or intentionally—select users based on non-representative criteria. We subconsciously select people who look friendly, relaxed, and outgoing. On-the-street interviews, retail intercepts, and all face-to-face interactions carry the possibility that we may reach only those people who are willing to talk to us. Are they representative of your audience, or are they only representative of people willing to talk to an inquisitive stranger holding an iPad?
Keep a vigilant eye on data that fits a little too neatly into recommendations—even your own. Realistic assessment of data may occasionally clip your wings, but it will help you avoid getting cut off at the knees.
Hobson’s Choice
Livery stables were the 17th-century equivalent of today’s car rental companies. Riders chose a horse, rode it, and then returned it. Thomas Hobson11 ran a livery stable outside of Cambridge, England. He realized that riders chose the good horses far more often than the bad, resulting in the overuse of some horses and the underuse of others. Like automobiles, horses accrue mileage. Hobson decided to eliminate the rider’s choice. He gave prospective riders a single option: ride the horse I choose for you or do not ride at all. In short, “take it or leave it.”
We often face a Hobson’s choice when researching, designing, and building software. We accept a bad solution rather than go without. A study does not include enough participants; an experience feels awkward; an app’s performance trots rather than gallops. However, your team employs the solution anyway. Short schedules and insufficient budgets often take the blame.
If a solution is bad, it is best not to take it out of the stable, so to speak. In today’s world of rapid iteration, we sometimes accept a Hobson’s choice solution in the hope that eventually it will be replaced. We emphasize the now over the good at our peril. As the saying goes: “The joy of an early release lasts but a short time. The bitterness of an unusable system can last for years.”12
Researching, brainstorming, designing, developing, scheduling, budgeting, and managing generate a lot of horse shit. You need to find a way to stomp through it and reach the road leading to your audience. Avoid the hazards along the way. Recognize coincidence, pick your targets, and always be wary of strange, old men offering help—including me.
Key Takeaways
Quantitative research involves numerical and statistical analyses.
Quantitative research provides the “what” about a phenomenon.
Useful quantitative research is statistically significant.
Good statistics are generalizable and can be used to infer conclusions about an entire population.
When collecting data, make sure research subjects are representative of a population.
Correlation is not causality!
Questions to Ask Yourself
If the research were repeated, how often would it produce similar results?
How accurate is the research data?
Is our research data representative of a population?
Am I inadvertently selecting people from a population who are like me?
Am I accounting for people who do not respond to a survey?
Am I mistaking correlation for causality?
Have I clearly stated my research objective before conducting research?
Am I ignoring differences or overemphasizing similarities within the research data?
Am I stretching data to meet my client’s, my team’s, or my own needs?
Is a timeline or budget affecting my objectivity?